March 15, 2017
These slides: http://www.databrew.cc/cism
Reason 1: It's free
Reason 2: It's "open source"
Reason 3: It's beautiful
Reason 4: It's powerful
Reason 5: It's fun
Download R: https://www.r-project.org/
Download RStudio: https://www.rstudio.com/products/rstudio/download/
Let's write some code!
2 + 2
[1] 4
x <- c(1,2,3,4,5)
x
[1] 1 2 3 4 5
barplot(x)
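c() combines values into a single vector, and arithmetic on a vector works element by element. A minimal sketch using the same x as above:

```r
# c() combines values into a single vector
x <- c(1, 2, 3, 4, 5)

x * 2      # multiplies every element: 2 4 6 8 10
x + x      # adds element by element: 2 4 6 8 10
sum(x)     # adds all the elements together: 15
length(x)  # number of elements: 5
```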
A "package" is simply a collection of code written by someone else.
It's what makes R powerful, but also confusing.
You only have to install a package once on each computer.
install.packages('dplyr')
install.packages('devtools')
devtools::install_github('databrew/databrew')
devtools::install_github('joebrew/cism')
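Because installing is a one-time step, some people guard it so that re-running a script does not reinstall everything. The helper below is hypothetical (it is not part of base R, dplyr, or devtools); it is a small sketch that only calls install.packages() for packages that requireNamespace() cannot find.

```r
# Hypothetical helper (not part of base R, dplyr, or devtools):
# install only the packages that are not already installed.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)  # the names that needed installing
}

# Example: install_if_missing(c("dplyr", "devtools"))
```

GitHub packages such as databrew and cism would still need devtools::install_github().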
You have to load a package with the library function every time you start a new R session.
library(databrew)
library(cism)
library(sp)
Writing library just means "I am going to use this package".
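These slides also use the pkg::name form (for example devtools::install_github and databrew::frangos). It uses one object from an installed package without calling library() first. A small sketch with packages that ship with R:

```r
# pkg::name uses one function from a package without calling library()
stats::median(c(3, 1, 2))   # 2
utils::head(letters, 3)     # "a" "b" "c"
```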
Since we've already run library(cism), we can now use tools and data from the cism package, such as moz0 and man3.
plot(moz0)
plot(man3)
a <- 1
a + 3
[1] 4
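Note that a + 3 only computes a value; it does not change a. To keep the result, assign it with <-, as in this short sketch:

```r
a <- 1
a + 3        # prints 4, but a itself is unchanged
a            # still 1

b <- a + 3   # store the result in a new object...
a <- a + 3   # ...or overwrite a with the new value
```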
Let's create an object called "ages" that contains everyone's age.
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)
How do we view our ages object?
ages
[1] 30 26 31 39 45 27 28 22 19 30 35
How do we view just the first element of our ages object?
ages[1]
[1] 30
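Square brackets accept more than one position, negative positions, and TRUE/FALSE conditions. A sketch using the ages printed above:

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

ages[c(1, 3)]        # first and third elements: 30 31
ages[length(ages)]   # last element: 35
ages[-1]             # everything except the first element
ages[ages > 30]      # only the ages above 30: 31 39 45 35
```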
How do we sort our ages object?
sorted_ages <- sort(ages)
sorted_ages
[1] 19 22 26 27 28 30 30 31 35 39 45
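sort() can also go from largest to smallest, and order() returns the positions that would sort the vector. order() becomes useful later for sorting the rows of a data frame. A sketch with the same ages:

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

sort(ages, decreasing = TRUE)   # largest first: 45 39 35 31 30 30 28 27 26 22 19
order(ages)                     # positions of the elements, smallest value first
ages[order(ages)]               # same result as sort(ages)
```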
How do we get the minimum, maximum, average age?
min(ages)
[1] 19
max(ages)
[1] 45
mean(ages)
[1] 30.18182
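The mean is just the sum divided by the number of values. A sketch that checks this by hand and shows a few related summaries:

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

sum(ages) / length(ages)   # 332 / 11 = 30.18182, the same as mean(ages)
median(ages)               # middle value of the sorted ages: 30
range(ages)                # minimum and maximum together: 19 45
summary(ages)              # minimum, quartiles, median, mean and maximum in one call
```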
How do we visualize our ages object?
hist(ages)
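hist() groups the values into bins and counts how many fall in each. With plot = FALSE it returns those numbers instead of drawing; the bin edges below (every 5 years from 15 to 45) are an arbitrary choice for illustration:

```r
ages <- c(30, 26, 31, 39, 45, 27, 28, 22, 19, 30, 35)

# plot = FALSE returns the bins and counts instead of drawing the histogram
h <- hist(ages, breaks = seq(15, 45, by = 5), plot = FALSE)
h$breaks   # bin edges: 15 20 25 30 35 40 45
h$counts   # ages in each bin: 1 1 5 2 1 1
```

By default each bin includes its right edge, so both 30s fall in the 25 to 30 bin.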
Previously, we looked at a one-dimensional object: ages.
But most data is two-dimensional: rows and columns.
In R, a table of rows and columns is called a data frame.
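A data frame is a set of vectors of the same length, one per column. A minimal sketch with made-up values (the names are hypothetical):

```r
# A small, made-up data frame: each column is a vector of the same length
people <- data.frame(
  name = c("Ana", "Joe", "Rita"),   # hypothetical names
  age  = c(30, 26, 31)
)

people          # 3 rows, 2 columns
people$age      # one column, as a vector: 30 26 31
nrow(people)    # 3
ncol(people)    # 2
```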
Let's play around with some real data.
Let's load a simple data frame.
www.databrew.cc/frangos.csv
frangos <- databrew::frangos
head(frangos)
# A tibble: 6 × 4
   diet chick      days grams
  <chr> <int>     <dbl> <int>
1  corn     1 0.1916719    42
2  corn     1 1.0106406    51
3  corn     1 4.5217673    59
4  corn     1 6.7225206    64
5  corn     1 8.1383321    76
6  corn     1 9.1120955    93
Let's explore.
Brackets: [] select parts of an object. For a data frame, the pattern is data[rows, columns].
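A sketch of bracket indexing on a data frame. So that it runs without the databrew package, it rebuilds the six rows printed by head(frangos) above as a stand-in called frangos_head:

```r
# Stand-in copy of the six rows printed by head(frangos) above
frangos_head <- data.frame(
  diet  = rep("corn", 6),
  chick = rep(1L, 6),
  days  = c(0.1916719, 1.0106406, 4.5217673, 6.7225206, 8.1383321, 9.1120955),
  grams = c(42L, 51L, 59L, 64L, 76L, 93L)
)

frangos_head[2, ]          # row 2, all columns
frangos_head[, "grams"]    # all rows, the grams column
frangos_head[2, "grams"]   # a single value: 51

# rows where grams is above 60, keeping only two columns
frangos_head[frangos_head$grams > 60, c("days", "grams")]
```

Leaving the row or column slot empty means "all of them".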
Always save your scripts.
Never save your "workspace" (the .RData file); your scripts should be able to recreate everything.
Work in RStudio "projects".
We're going to use the cism package to get weather data for the FQMA weather station (Maputo).
library(cism)
??get_weather
weather <- get_weather(station = 'FQMA',
start_year = 2010,
end_year = 2016)
Now that we have our weather data, we can look at it.
head(weather)
        date temp_max temp_mean temp_min humidity_max humidity_mean
1 2010-01-01       34        30       26           94            66
2 2010-01-02       31        27       24           89            72
3 2010-01-03       32        28       24           94            79
4 2010-01-04       31        26       21          100            84
5 2010-01-05       25        23       21          100            82
6 2010-01-06       28        24       20           83            69
  humidity_min precipitation cloud_cover location
1           52             0           2     FQMA
2           55             0           5     FQMA
3           55             0           6     FQMA
4           58             0           6     FQMA
5           65             0           6     FQMA
6           54             0           3     FQMA
# 1. How many rows are in our data?
nrow(weather)
[1] 2302
# 2. How many columns?
ncol(weather)
[1] 10
# 3. What are the names of the columns?
colnames(weather)
 [1] "date"          "temp_max"      "temp_mean"     "temp_min"
 [5] "humidity_max"  "humidity_mean" "humidity_min"  "precipitation"
 [9] "cloud_cover"   "location"
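dim() returns the number of rows and columns together, and str() shows every column's name and type at once. Because the weather data needs the cism package and an internet connection, this sketch uses R's built-in airquality data frame as a stand-in:

```r
data(airquality)      # built-in daily air quality data (New York, 1973)

dim(airquality)       # rows and columns together: 153 6
nrow(airquality)      # 153
ncol(airquality)      # 6
colnames(airquality)  # "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
str(airquality)       # names, types and first values of every column
```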
# 4. What is the date range?
range(weather$date)
[1] "2010-01-01" "2016-12-12"
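Dates can be subtracted and turned into sequences. A sketch using the two dates printed above: the range spans 2538 calendar days, counting both ends, while nrow(weather) returned 2302. If each row is one day, that comparison suggests some days are absent from the data; treat it as a quick check, not a conclusion.

```r
# Start and end dates printed by range(weather$date) above
date_range <- as.Date(c("2010-01-01", "2016-12-12"))

diff(date_range)   # Time difference of 2537 days

# every calendar day in the range, counting both ends
all_days <- seq(date_range[1], date_range[2], by = "day")
length(all_days)   # 2538

# With the real data (assuming weather$date has class Date), the absent days would be:
# all_days[!all_days %in% weather$date]
```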
# 5. What is the maximum temperature?
max(weather$temp_max, na.rm = TRUE)
[1] 44
# 6. What is the minimum temperature?
min(weather$temp_min, na.rm = TRUE)
[1] 7
# 7. What is the average temperature?
mean(weather$temp_mean, na.rm = TRUE)
[1] 23.84982
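The argument na.rm = TRUE matters when a column contains missing values (NA): without it, max(), min() and mean() return NA. A sketch with made-up temperatures:

```r
temps <- c(34, 31, NA, 25)     # made-up values with one missing day

max(temps)                     # NA: the maximum is unknown if one value is unknown
max(temps, na.rm = TRUE)       # 34, ignoring the NA
mean(temps, na.rm = TRUE)      # (34 + 31 + 25) / 3 = 30
sum(is.na(temps))              # how many values are missing: 1
```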
Which variables do we have which are numeric and continuous?
temp_max, temp_mean, temp_min, etc.
How can we visualize these?
boxplot(weather$temp_mean)
hist(weather$temp_mean)
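The box in boxplot() is based on Tukey's five-number summary, which fivenum() returns directly. A sketch using the temp_mean values from the first six rows of weather shown earlier:

```r
# temp_mean values from the first six rows of weather shown earlier
temp_mean_head <- c(30, 27, 28, 26, 23, 24)

fivenum(temp_mean_head)   # min, lower hinge, median, upper hinge, max: 23 24 26.5 28 30
summary(temp_mean_head)   # similar numbers (quartiles computed slightly differently), plus the mean
```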
Let's create a variable called "hot"
weather$hot <- ifelse(weather$temp_max > 30, 'hot', 'not hot')
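ifelse(test, value if TRUE, value if FALSE) works element by element, so it labels every row at once. A sketch with the temp_max values from the first six rows of weather shown earlier:

```r
temp_max_head <- c(34, 31, 32, 31, 25, 28)   # temp_max of the first six rows shown earlier

# each value above 30 becomes "hot", everything else "not hot"
hot_head <- ifelse(temp_max_head > 30, "hot", "not hot")
hot_head   # "hot" "hot" "hot" "hot" "not hot" "not hot"
```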
head(weather)
        date temp_max temp_mean temp_min humidity_max humidity_mean
1 2010-01-01       34        30       26           94            66
2 2010-01-02       31        27       24           89            72
3 2010-01-03       32        28       24           94            79
4 2010-01-04       31        26       21          100            84
5 2010-01-05       25        23       21          100            82
6 2010-01-06       28        24       20           83            69
  humidity_min precipitation cloud_cover location     hot
1           52             0           2     FQMA     hot
2           55             0           5     FQMA     hot
3           55             0           6     FQMA     hot
4           58             0           6     FQMA     hot
5           65             0           6     FQMA not hot
6           54             0           3     FQMA not hot
table(weather$hot)
hot_table <- table(weather$hot)
hot_prop_table <- prop.table(hot_table)
barplot(hot_table)
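table() counts each label and prop.table() divides each count by the total. A sketch using the six hot/not hot labels from the head(weather) output above:

```r
hot_head <- c("hot", "hot", "hot", "hot", "not hot", "not hot")

hot_table <- table(hot_head)             # counts: hot 4, not hot 2
hot_prop_table <- prop.table(hot_table)  # each count divided by the total (6)
round(100 * hot_prop_table, 1)           # percentages: 66.7 33.3
```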
barplot(hot_table,
main = 'Hot days in Maputo')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature')
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature',
col = c('red', 'blue'))
barplot(hot_table,
main = 'Hot days in Maputo',
ylab = 'Number of days',
xlab = 'Temperature',
col = c('red', 'blue'),
border = 'darkgrey')
Let's create a plot of date (x-axis) and the maximum temperature
plot(weather$date,
weather$temp_max)
Let's make our plot prettier
plot(weather$date,
weather$temp_max,
type = 'l',
col = 'red',
xlab = 'Date',
ylab = 'Maximum temperature',
main = 'Maximum temperature in Maputo')
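With type = 'l', plot() connects the points in the order the rows appear. The weather rows shown earlier are already in date order, but if a data set were not, the line would jump back and forth. A sketch with made-up rows, sorted with order() first:

```r
# Made-up rows stored out of date order
df <- data.frame(
  date = as.Date(c("2010-01-03", "2010-01-01", "2010-01-02")),
  temp_max = c(32, 34, 31)
)

# rearrange the rows from earliest to latest date before drawing a line
df_sorted <- df[order(df$date), ]
plot(df_sorted$date, df_sorted$temp_max, type = "l")
```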
We're going to analyze where Joe has been, using his location data from Google. The data is included in the databrew package.
# Load package
library(databrew)
# Get data
joe <- databrew::joe
Let's have a look at the structure of our data.
head(joe)
        date                time longitude  latitude velocity altitude
1 2017-03-13 2017-03-13 11:08:06  32.79699 -25.40760       NA       NA
2 2017-03-13 2017-03-13 11:06:01  32.79699 -25.40760       NA       NA
3 2017-03-13 2017-03-13 11:05:32  32.80439 -25.40608       NA       NA
4 2017-03-13 2017-03-13 11:03:03  32.80439 -25.40608       NA       NA
5 2017-03-13 2017-03-13 11:01:03  32.80545 -25.40844       NA       NA
6 2017-03-13 2017-03-13 11:00:16  32.80545 -25.40779       NA       NA
  heading accuracy
1      NA     2500
2      NA     2500
3      NA     1899
4      NA     1899
5      NA      400
6      NA      699
Let's filter our data so that it only contains observations for the period from March 7-13.
joe_filtered <- joe[joe$date >= '2017-03-07' &
joe$date <= '2017-03-13',]
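The filter keeps rows where both conditions are TRUE (&), and the empty slot after the comma keeps all columns. A sketch on a made-up stand-in for joe, converting the limits to Date explicitly and then checking the result:

```r
# Made-up stand-in for joe: one row per location record
pings <- data.frame(
  date = as.Date(c("2017-03-05", "2017-03-07", "2017-03-10", "2017-03-13", "2017-03-14")),
  longitude = c(32.80, 32.79, 32.81, 32.80, 32.57)
)

start <- as.Date("2017-03-07")
end   <- as.Date("2017-03-13")

# keep rows on or after start AND on or before end; all columns
pings_filtered <- pings[pings$date >= start & pings$date <= end, ]

range(pings_filtered$date)   # "2017-03-07" "2017-03-13"
table(pings_filtered$date)   # number of rows per day
```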
Now let's use the cism package to plot Manhiça.
library(cism)
library(sp)
manhica <- man3
plot(manhica)
The databrew package has a nice function called visualize_location. Let's try it out.
?visualize_location
visualize_location(x = joe_filtered,
spdf = manhica)
Let's also try it with an interactive map.
visualize_location(x = joe_filtered,
use_leaflet = TRUE)